Hessian estimate
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective: repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of prior approaches as special cases and elucidates the bias-variance trade-off of Hessian estimates. It also opens the door to a new family of estimates, which can be easily implemented with auto-differentiation libraries and lead to performance gains in practice.
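As a concrete illustration of the off-policy-evaluation view (a minimal sketch, not the paper's method): for a one-step Gaussian policy, repeatedly applying autodiff to an importance-weighted surrogate yields unbiased gradient and Hessian estimates of the value. The policy, quadratic reward, and sample size below are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def log_prob(theta, action):
    # log-density (up to a constant) of a unit-variance Gaussian policy;
    # the constant cancels in the importance ratio below
    return -0.5 * (action - theta) ** 2

def surrogate(theta, theta_b, actions, rewards):
    # Off-policy value estimate: reweight behavior samples by p_theta / p_b.
    # Differentiating this scalar w.r.t. theta any number of times gives
    # derivative estimates of the value function.
    w = jnp.exp(log_prob(theta, actions) - log_prob(theta_b, actions))
    return jnp.mean(w * rewards)

theta_b = jnp.array(0.5)                               # behavior parameter
actions = theta_b + jax.random.normal(jax.random.PRNGKey(0), (4096,))
rewards = -(actions - 1.0) ** 2                        # toy reward signal

grad_est = jax.grad(surrogate)(theta_b, theta_b, actions, rewards)
hess_est = jax.hessian(surrogate)(theta_b, theta_b, actions, rewards)
# For this toy problem the exact values at theta = 0.5 are
# gradient = 1.0 and Hessian = -2.0, so both estimates can be checked.
```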
Policy Newton methods for Distortion Riskmetrics
Pachal, Soumen, Maniyar, Mizhaan Prajit, A, Prashanth L.
We consider the problem of risk-sensitive control in a reinforcement learning (RL) framework. In particular, we aim to find a risk-optimal policy by maximizing the distortion riskmetric (DRM) of the discounted reward in a finite-horizon Markov decision process (MDP). DRMs are a rich class of risk measures that include several well-known risk measures as special cases. We derive a policy Hessian theorem for the DRM objective using the likelihood ratio method. Using this result, we propose a natural DRM Hessian estimator from sample trajectories of the underlying MDP. Next, we present a cubic-regularized policy Newton algorithm for solving this problem in an on-policy RL setting using estimates of the DRM gradient and Hessian. Our proposed algorithm is shown to converge to an $ε$-second-order stationary point ($ε$-SOSP) of the DRM objective, a guarantee that ensures escape from saddle points. The sample complexity of our algorithm for finding an $ε$-SOSP is $\mathcal{O}(ε^{-3.5})$. Our experiments validate the theoretical findings. To the best of our knowledge, ours is the first work to establish convergence to an $ε$-SOSP of a risk-sensitive objective; existing works have shown convergence either to a first-order stationary point of a risk-sensitive objective or to an SOSP of a risk-neutral one.
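For intuition, here is a minimal sketch of a likelihood-ratio Hessian estimator in the risk-neutral case, H(theta) ≈ mean_i R_i (g_i g_i^T + ∇² log p_theta(τ_i)) with g_i = ∇ log p_theta(τ_i); the paper's DRM estimator instead reweights trajectories through the distortion function, and the linear-Gaussian policy below is an illustrative assumption, not the paper's construction.

```python
import jax
import jax.numpy as jnp

def traj_log_prob(theta, states, actions):
    # log-likelihood of one trajectory under a Gaussian policy with mean
    # states @ theta (transition terms drop out of the theta-derivatives)
    return jnp.sum(-0.5 * (actions - states @ theta) ** 2)

def lr_hessian_estimate(theta, states, actions, returns):
    # states: (N, T, d), actions: (N, T), returns: (N,)
    g = jax.vmap(jax.grad(traj_log_prob), in_axes=(None, 0, 0))(theta, states, actions)
    H = jax.vmap(jax.hessian(traj_log_prob), in_axes=(None, 0, 0))(theta, states, actions)
    outer = jnp.einsum('ni,nj->nij', g, g)   # score outer products g_i g_i^T
    return jnp.mean(returns[:, None, None] * (outer + H), axis=0)
```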
Stochastic Newton Proximal Extragradient Method
Jiang, Ruichen, Dereziński, Michał, Mokhtari, Aryan
Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, which requires growing sample sizes and hence increasing per-iteration costs over time. Recent work [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, that method has slow global convergence, requiring up to $\tilde{O}(\kappa^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $\kappa$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(\kappa)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.
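For intuition, a minimal sketch of the Hessian-averaging idea this paper builds on: precondition the gradient with a running average of noisy Hessian estimates, so that Hessian noise is damped over time at no extra per-iteration cost. The function names and the uniform weighting are assumptions; the paper's method additionally wraps such steps in an HPE-style extragradient loop, omitted here.

```python
import jax.numpy as jnp

def averaged_newton_step(x, H_avg, t, grad_fn, noisy_hess_fn, reg=1e-6):
    # One noisy Hessian oracle call per iteration, as in the averaged scheme.
    H_t = noisy_hess_fn(x)
    # Uniform running average: the effective Hessian noise shrinks like 1/t
    # even though each iteration uses a single noisy estimate.
    H_avg = (t * H_avg + H_t) / (t + 1)
    # Newton-type step preconditioned by the averaged Hessian; a small
    # regularizer keeps the linear system well-posed.
    direction = jnp.linalg.solve(H_avg + reg * jnp.eye(x.shape[0]), grad_fn(x))
    return x - direction, H_avg
```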